A Cohesion-based Approach for Unsupervised Recognition of Literal and Nonliteral Use of Multiword Expressions

Authors

  • Linlin Li
  • Caroline Sporleder
  • Manfred Pinkal
  • Werner Saurer

Abstract

Texts frequently contain expressions whose meaning is not strictly literal, such as idioms. Idiomatic and non-literal expressions pose a major challenge to natural language processing technology because they often exhibit lexical and syntactic idiosyncrasies. We propose a novel unsupervised method for distinguishing literal and non-literal usages of expressions. Our method determines how well a literal interpretation of the expression is linked to the overall cohesive structure of the discourse. If only weak cohesive links can be found, the expression is classified as idiomatic. We propose two methods to model the cohesive links in our task: a lexical-chain-based approach and a cohesion-graph-based approach. While the chain-based approach is effective at distinguishing literal and non-literal usage, it is sensitive to the choice of chaining algorithm, to parameter settings and to the data setup. To overcome these problems, we further develop the chain-based approach into a graph-based approach. This development makes our cohesion-based approach unsupervised while maintaining high performance.

Thesis Supervisor: Caroline Sporleder
Title: Dr.

Thesis Supervisor: Manfred Pinkal
Title: Prof.
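
Both cohesion models named in the abstract can be illustrated with a small Python sketch. Everything concrete in it is an illustrative assumption, not the thesis's implementation: the relatedness scores in `TOY_RELATEDNESS` are invented (a real system would estimate them from corpus or web statistics), the greedy chaining rule and its `threshold` are hypothetical, and the graph rule (classify as non-literal if removing the expression's words raises the graph's average edge weight) is one plausible way to operationalise "only weak cohesive links". The toy expression is *break the ice*, reduced to its content words.

```python
from itertools import combinations
from statistics import mean

# Hypothetical toy relatedness scores in [0, 1], invented for illustration.
# Pairs not listed are treated as unrelated (0.0).
_PAIRS = [
    ("break", "ice", 0.5),
    ("ice", "frozen", 0.9), ("ice", "lake", 0.7), ("ice", "skate", 0.8),
    ("break", "frozen", 0.3), ("break", "lake", 0.4), ("break", "skate", 0.3),
    ("frozen", "lake", 0.6), ("frozen", "skate", 0.5), ("lake", "skate", 0.4),
    ("party", "guests", 0.8), ("party", "chat", 0.6), ("guests", "chat", 0.7),
]
TOY_RELATEDNESS = {frozenset((a, b)): s for a, b, s in _PAIRS}


def relatedness(a: str, b: str) -> float:
    """Symmetric lookup of the toy relatedness score for two content words."""
    return TOY_RELATEDNESS.get(frozenset((a, b)), 0.0)


# --- Lexical-chain model (simplified, hypothetical chaining rule) ---
def build_chains(words: list[str], threshold: float) -> list[list[str]]:
    """Greedily add each word to the first chain holding a word related to it
    at or above `threshold`; otherwise start a new chain."""
    chains: list[list[str]] = []
    for word in words:
        for chain in chains:
            if any(relatedness(word, member) >= threshold for member in chain):
                chain.append(word)
                break
        else:  # runs only if no existing chain accepted the word
            chains.append([word])
    return chains


def is_literal_by_chains(mwe: list[str], context: list[str],
                         threshold: float = 0.5) -> bool:
    """Literal if some chain contains both an expression word and a context word."""
    mwe_set = set(mwe)
    for chain in build_chains(mwe + context, threshold):
        members = set(chain)
        if members & mwe_set and members - mwe_set:
            return True
    return False


# --- Cohesion-graph model (complete graph over the content words) ---
def connectivity(words: list[str]) -> float:
    """Average edge weight over all word pairs (0.0 if fewer than two words)."""
    pairs = list(combinations(words, 2))
    if not pairs:
        return 0.0
    return mean(relatedness(a, b) for a, b in pairs)


def is_literal_by_graph(mwe: list[str], context: list[str]) -> bool:
    """Non-literal if removing the expression's words raises connectivity."""
    return connectivity(mwe + context) >= connectivity(context)


if __name__ == "__main__":
    idiom = ["break", "ice"]
    literal_ctx = ["frozen", "lake", "skate"]
    idiomatic_ctx = ["party", "guests", "chat"]
    print(is_literal_by_chains(idiom, literal_ctx))                  # True
    print(is_literal_by_chains(idiom, idiomatic_ctx))                # False
    print(is_literal_by_chains(idiom, literal_ctx, threshold=0.95))  # False
    print(is_literal_by_graph(idiom, literal_ctx))    # True  (0.54 vs 0.50)
    print(is_literal_by_graph(idiom, idiomatic_ctx))  # False (0.26 vs 0.70)
```

On this toy data both rules label the *frozen lake* context literal and the *party* context non-literal. Raising the chaining `threshold` to 0.95 leaves every word in its own chain, so the chain rule then mislabels the literal context. This mirrors, in miniature, the parameter sensitivity that motivated the move to a graph-based model; the graph rule in this sketch has no threshold to tune.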

Similar resources

A Cohesion Graph Based Approach for Unsupervised Recognition of Literal and Non-literal Use of Multiword Expressions

We present a graph-based model for representing the lexical cohesion of a discourse. In the graph structure, vertices correspond to the content words of a text and edges connecting pairs of words encode how closely the words are related semantically. We show that such a structure can be used to distinguish literal and non-literal usages of multi-word expressions.

Literal or idiomatic? Identifying the reading of single occurrences of German multiword expressions using word embeddings

Non-compositional multiword expressions (MWEs) still pose serious issues for a variety of natural language processing tasks and their ubiquity makes it impossible to get around methods which automatically identify these kinds of MWEs. The method presented in this paper was inspired by Sporleder and Li (2009) and is able to discriminate between the literal and non-literal use of an MWE in an unsu...

A Clustering Approach for the Unsupervised Recognition of Nonliteral Language

In this thesis we present TroFi, a system for separating literal and nonliteral usages of verbs through unsupervised statistical word-sense disambiguation and clustering techniques. TroFi distinguishes itself by redefining the types of nonliteral language handled and by depending purely on sentential context rather than selectional constraint violations and paths in semantic hierarchies. TroFi ...

A Clustering Approach for Nearly Unsupervised Recognition of Nonliteral Language

In this paper we present TroFi (Trope Finder), a system for automatically classifying literal and nonliteral usages of verbs through nearly unsupervised word-sense disambiguation and clustering techniques. TroFi uses sentential context instead of selectional constraint violations or paths in semantic hierarchies. It also uses literal and nonliteral seed sets acquired and cleaned without human s...

Acquiring Multiword Verbs: The Role of Statistical Evidence

In addition to words and grammar, young children learn a large number of multiword sequences that are semantically idiosyncratic and have particular syntactic behaviour, e.g., expressions formed from the combination of a verb and a noun, such as take the train and give a kiss. Given the high degree of polysemy of verbs that commonly participate in such constructions, an important question is wh...

Publication date: 2008